-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch from XZ to ZSTD and improve release artifact transparency #343
Conversation
I've always preferred ZSTD do XZ anyway, but the default Ubuntu runner GitHub Actions used to use was so old that it did not have a sufficiently new autotools release to have ZSTD support. Hence XZ was always a compromise to get better compression without compatibility problems. Times have changed, the upstream `xz-utils` project has been backdoored, and ZSTD is available in Ubuntu LTS releases so we don't have a problem auto-generating releases any more. As-yet no evidence has surfaced that XZ compressed artifacts are compromised, but since there are no blockers anyway switching seems like a good plan to me.
7b06630
to
2e2e5a7
Compare
I was playing around with reproducibility with this and ran into an interesting issue. I wanted to verify that the standalone file being generated in CI was bit-for-bit identical with one made locally so it could be audited if desired. The upshot is it is reproducible, but only on the default branch. Verifying a PR is somewhat harder because Github is doing a merge action before running the tests and testing the theoretial result of the PR being merged, not just the branch. That means the artifacts generated in the workflows show here will have different --- vcsh-standalone.sh 2024-04-03 14:58:31.815280296 +0300
+++ /home/caleb/Downloads/vcsh/vcsh-standalone.sh 2024-04-03 11:25:10.000000000 +0300
@@ -15,7 +15,7 @@
# version built from the main branch HEAD. If "-standalone" is appended you are
# using a version built explicitly for portability as a standalone script
# rather than being installed to a specific system.
-VCSH_VERSION='2.0.8.r3-g2e2e5a7-standalone'; export VCSH_VERSION
+VCSH_VERSION='2.0.8.r4-gd416d7d-standalone'; export VCSH_VERSION
VCSH_SELF='vcsh-standalone.sh'; export VCSH_SELF
# Ensure all files created are accessible only to the current user. The same workflow will run on the branch pushed by the the PR contributor, but it will only be accessible by looking at the workflow runs on the branch on their fork. I think this is reasonable behavior, it just caught be a little off guard. And again the upstream is that artifacts on every commit to main and of course release tags should be reproducible. |
One of the major complains surrounding the recent XZ fiasco is that auto-tools generated source files include so much obtuse code that they are difficult to audit. Also it isn't immediately apparent what sources they have been generated from. In our case we're generating the source dist files in CI anyway and automatically attaching them to releases, but GH does not make it possible to verify this. They could just as well be reposted later by a malicious maintainer. This is not a magic bullet to fix all that, but it should help. The CI environment can be verified by looking at the workflow file and the other Git sources so we're not using a modified version of autotools or anything like that. Checksums are now being generated after making the distribution tarballs, and *echoed to the output log* so it is possible to verify that the files generated in CI are actually still the ones attached to the release. The checksums file is also posted to the release.
0433c1e
to
735660b
Compare
We're extending GNU/Autotools with our own macros. This is allowed and even expected, but there are certainly more and less idiomatic ways to do so. This normalizes all our extended bits to be as idiomatic as possible to make auditing easier. * Filename prefixes are more representative of their source: especially we are not hijacking the ax_ prefix which usually references something from autotools-archive that could be compared with a known source. All files starting with ax_ could be removed from this repository and the project still build provided autotools-archive is available on the host system. We're still vendoring them because it is not as broadly available as autotools, but these bits don't need to be audited as part of this project. I used que_ as a prefix for custom macros to make them easy to find and because I've authored them and share identical extensions across a number of other projects. The configure macros and correstponding makefile fragments have matching prefixes so they sort together. * The AMINCLUDE system is leveraged instead of making the maintainer hand-rol includes for each fragment. This is actually probably slightly *harder* to grok, but again the macros used are standardized in autoconf-archive.) I think the more idiomatic usage has the payoff of not sounding any more alarms than we need to. The gotcha was that we do still want to inline them, which means a bit of extra bootstrapping. The macro include system also has the advantage of a single place to enable/disable a component of the build instead of needing to comment out both the configure.ac macro(s) and the matching Makefile.am include(s) separately.
1158bb0
to
9720336
Compare
Reverting this commit should remove the macro code. The project would still be buildable on any system with autoconf-archive, but since that is considerably less than systems with autoconf which is our actual target, we're vendoring a copy of them here. These files should be bit-for-bit the same as what appears in current macro archives.
Switch from xz to zst compression for official artifacts
I've always preferred ZSTD do XZ anyway, but the default Ubuntu runner GitHub Actions used to use was so old that it did not have a sufficiently new autotools release to have ZSTD support. Hence XZ was always a compromise to get better compression without compatibility problems. Times have changed, the upstream
xz-utils
project has been backdoored, and ZSTD is available in Ubuntu LTS releases so we don't have a problem auto-generating releases any more.As-yet no evidence has surfaced that XZ compressed artifacts are compromised, but since there are no blockers anyway switching seems like a good plan to me.
Generate checksums for release artifacts in CI
One of the major complains surrounding the recent XZ fiasco is that auto-tools generated source files include so much obtuse code that they are difficult to audit. Also it isn't immediately apparent what sources they have been generated from. In our case we're generating the source dist files in CI anyway and automatically attaching them to releases, but GH does not make it possible to verify this. They could just as well be reposted later by a malicious maintainer.
This is not a magic bullet to fix all that, but it should help. The CI environment can be verified by looking at the workflow file and the other Git sources so we're not using a modified version of autotools or anything like that. Checksums are now being generated after making the distribution tarballs, and echoed to the output log so it is possible to verify that the files generated in CI are actually still the ones attached to the release. The checksums file is also posted to the release.
Guard checksum generation (and the extra dependencies) behind maintainer mode
Use more GNU/Autotools idiomatic names and methods
We're extending GNU/Autotools with our own macros. This is allowed and even expected, but there are certainly more and less idiomatic ways to do so. This normalizes all our extended bits to be as idiomatic as possible to make auditing easier.
Filename prefixes are more representative of their source: especially we are not hijacking the ax_ prefix which usually references something from autotools-archive that could be compared with a known source. All files starting with ax_ could be removed from the repository and the project still built provided autotools-archive was available on the host system. We're still vendoring them because it is not as broadly available as autotools, but these bits don't need to be audited as part of this project.
I used que_ as a prefix to make them easy to find and because I've authored them and share identical extensions across a number of other projects. The configure macros and makefile extensions have matching prefixes so they sort together.
The AMINCLUDE system is leveraged instead of hand rolling includes. This is actually probably slightly harder to grok, but again the macros used are standardized in autoconf-archive.) I think the more idiomatic usage has the payoff of not sounding any more alarms than we need to.
It also has the advantage of a single place to enable/disable a component of the build instead of needing to comment out both the configure macro(s) and makefile include(s) separately.
Include macros from autoconf-archive (pristine)
Reverting this commit should remove the macro code. The project would still be buildable on any system with autoconf-archive, but since that is considerably less than systems with autoconf which is our actual target, we're vendoring a copy of them here. These files should be bit-for-bit the same as what appears in current macro archives.
Also just a reminder if you still don't trust autotools generated tarballs (and let's be honest, there is a lot of obtuse code in them!) you can use your own system instead of ours to generate all the intermediate artifacts. The
git archive
generated sources or a Git clone are also usable, but you need to run./botostrap.sh
(or even justautoreconf --install
) yourself before being able to configure.